Efficient Hybrid Execution of C++ Applications using Intel(R) Xeon Phi(TM) Coprocessor

Authors

  • Jirí Dokulil
  • Enes Bajrovic
  • Siegfried Benkner
  • Sabri Pllana
  • Martin Sandrieser
  • Beverly Bachmayer
Abstract

The introduction of Intel(R) Xeon Phi(TM) coprocessors opened up new possibilities in the development of highly parallel applications. The familiarity and flexibility of the architecture, together with compiler support integrated into the Intel C++ Composer XE, allow developers to use familiar programming paradigms and techniques that are usually not suitable for other accelerated systems. It is now easy to use complex, template-heavy C++ codes on the coprocessor, including, for example, the Intel Threading Building Blocks (TBB) parallelization library. These techniques are not only possible but usually efficient as well: host and coprocessor belong to the same architectural family, so optimization techniques designed for the Xeon CPU are also beneficial on Xeon Phi. As a result, highly optimized Xeon codes (like the TBB library) work well on both. In this paper we present a new parallel library construct that makes it easy to apply a function to every member of an array in parallel, dynamically distributing the work between the host CPUs and one or more coprocessor cards. We describe the associated runtime support and use a physical simulation example to demonstrate that our library construct can be used to quickly create a C++ application that benefits significantly from hybrid execution, simultaneously exploiting CPU cores and coprocessor cores. Experimental results show that a single optimized source code is sufficient to make both the host and the coprocessors run efficiently.
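The abstract does not show the construct itself, so the following is only a minimal sketch of the general idea of dynamic work distribution over an array, not the authors' library. It assumes a hypothetical function `hybrid_for_each` and a hypothetical `DeviceStats` record, and it models every "device" (host CPU or coprocessor card) as an ordinary `std::thread` on the host; no real offload to a coprocessor takes place. Workers claim fixed-size chunks from a shared atomic counter, so a faster worker naturally ends up processing more elements.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical record of how much work one "device" performed.
// In this sketch a device is simply a std::thread on the host.
struct DeviceStats {
    std::size_t items_processed = 0;
};

// Hypothetical construct: applies f to every element of data, handing out
// chunks of chunk_size elements on demand. f is called concurrently from
// several threads, so it must be safe to invoke in parallel on distinct elements.
// Returns the number of elements each worker processed.
template <typename T, typename F>
std::vector<DeviceStats> hybrid_for_each(std::vector<T>& data, F f,
                                         std::size_t num_devices,
                                         std::size_t chunk_size) {
    assert(num_devices > 0 && chunk_size > 0);
    std::atomic<std::size_t> next{0};  // index of the next unclaimed element
    std::vector<DeviceStats> stats(num_devices);
    std::vector<std::thread> workers;
    workers.reserve(num_devices);

    for (std::size_t d = 0; d < num_devices; ++d) {
        workers.emplace_back([&, d] {
            for (;;) {
                // Claim the next chunk; an idle worker comes back for more.
                const std::size_t begin = next.fetch_add(chunk_size);
                if (begin >= data.size()) break;
                const std::size_t end =
                    std::min(begin + chunk_size, data.size());
                for (std::size_t i = begin; i < end; ++i) f(data[i]);
                stats[d].items_processed += end - begin;
            }
        });
    }
    for (auto& w : workers) w.join();
    return stats;
}
```

A call such as `hybrid_for_each(particles, update, 3, 64)` would apply `update` to each element using three workers and chunks of 64 elements. The chunk size is the main tuning knob in a scheme like this: small chunks balance load better, while large chunks reduce the per-chunk overhead, which would matter more if each chunk had to be transferred to a coprocessor.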


Similar resources

coprocessors with a basic N-body simulation

Intel(R) Xeon Phi(TM) coprocessors are capable of delivering more performance and better energy efficiency than Intel(R) Xeon(R) processors for certain parallel applications. In this paper, we investigate the porting and optimization of a test problem for the Intel Xeon Phi coprocessor. The test problem is a basic N-body simulation, which is the foundation of a number of applications in compu...


Exploring SIMD for Molecular Dynamics, Using Intel

We analyse gather-scatter performance bottlenecks in molecular dynamics codes and the challenges that they pose for obtaining benefits from SIMD execution. This analysis informs a number of novel code-level and algorithmic improvements to Sandia’s miniMD benchmark, which we demonstrate using three SIMD widths (128-, 256- and 512-bit). The applicability of these optimisations to wider SIMD is discu...


Cluster-level tuning of a shallow water equation solver on the Intel MIC architecture

The paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (shallow water equation solver) on a cluster enabled with Intel Xeon Phi coprocessors. The discussion includes: 1. Controlling the number and affinity of OpenMP threads to optimize access to memory bandwidth; 2. Tuning the inter-operation of OpenMP and MPI to partition t...


Coprocessors: An Early Performance Comparison

The demand for more and more compute power is growing rapidly in many fields of research. Accelerators, like GPUs, are one way to fulfill these requirements, but they often require a laborious rewrite of the application using special programming paradigms like CUDA or OpenCL. The Intel(R) Xeon Phi(TM) coprocessor is based on the Intel(R) Many Integrated Core Architecture and can be programmed ...


High Order Seismic Simulations on the Intel Xeon Phi Processor (Knights Landing)

We present a holistic optimization of the ADER-DG finite element software SeisSol targeting the Intel(R) Xeon Phi(TM) x200 processor, codenamed Knights Landing (KNL). SeisSol is a multi-physics software package performing earthquake simulations by coupling seismic wave propagation and the rupture process. The code was shown to scale beyond 1.5 million cores and achieved petascale performance when...



Journal title:
  • CoRR

Volume abs/1211.5530


Publication date: 2012